State-Level Data Borrowing for Low-Resource Speech Recognition Based on Subspace GMMs
Authors
Abstract
Large vocabulary continuous speech recognition is always a difficult task, and it is particularly so for low-resource languages. The scenario we focus on here is having only 1 hour of acoustic training data in the "target" language. This paper presents work on a data borrowing strategy combined with the recently proposed Subspace Gaussian Mixture Model (SGMM). We developed data borrowing strategies based on two approaches: one based on minimizing K-L divergence, and one that also takes into account state occupation counts. We demonstrate improvements over the baseline SGMM setup, which itself is better than a conventional HMM-GMM system. The SGMMs are more robustly estimated by borrowing data from the non-target language at the acoustic-state level. Although we tested the approach for SGMMs, we expect the general idea of borrowing data from a non-target language to be applicable to conventional GMMs as well.
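The abstract does not spell out how the K-L divergence criterion is computed, so the following is only a minimal sketch of the general idea, not the paper's procedure. It assumes each acoustic state can be summarized by a single diagonal-covariance Gaussian (a real SGMM state is a mixture, for which the divergence has no closed form), and it assumes a simple occupation-count threshold as a stand-in for the count-aware variant. The names `kl_diag_gauss`, `choose_donors`, and `count_threshold` are hypothetical.

```python
import numpy as np


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) between two diagonal-covariance Gaussians."""
    mu_p, var_p, mu_q, var_q = (
        np.asarray(a, dtype=float) for a in (mu_p, var_p, mu_q, var_q)
    )
    terms = np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    return 0.5 * float(np.sum(terms))


def choose_donors(target_states, donor_states, count_threshold=None):
    """Map each target state to the non-target state with the smallest KL.

    target_states: dict name -> (mean, variance, occupation_count)
    donor_states:  dict name -> (mean, variance)
    count_threshold: assumption for illustration only; when set, just the
        target states whose occupation count is below it borrow data.
    """
    mapping = {}
    for t_name, (t_mu, t_var, t_count) in target_states.items():
        if count_threshold is not None and t_count >= count_threshold:
            continue  # state is assumed to have enough data of its own
        mapping[t_name] = min(
            donor_states,
            key=lambda d: kl_diag_gauss(t_mu, t_var, *donor_states[d]),
        )
    return mapping


# Toy data: two target-language states and two non-target-language states.
target = {
    "t1": ([0.0, 0.0], [1.0, 1.0], 50.0),
    "t2": ([5.0, 5.0], [1.0, 1.0], 5000.0),
}
donors = {
    "d1": ([0.1, -0.1], [1.0, 1.0]),
    "d2": ([4.0, 4.0], [2.0, 2.0]),
}

borrow_all = choose_donors(target, donors)
borrow_sparse = choose_donors(target, donors, count_threshold=100.0)
```

In this toy setup every target state is paired with its closest donor when no threshold is given, while the thresholded call pairs only the sparsely observed state `t1`. How the borrowed frames are then weighted during SGMM estimation is outside what the abstract describes.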
Similar resources
Robust Estimation and Adaptation of Subspace Gaussian Mixture Models for Automatic Speech Recognition
In conventional hidden Markov model (HMM) based speech recognisers, the emitting HMM states are modelled by Gaussian Mixture Models (GMMs), with parameters being estimated directly from the training data. However, in Subspace Gaussian Mixture Model (SGMM) based acoustic modelling, the parameters of each state model are derived from the globally shared model subspaces which are normally low dimensi...
Canonical state models for automatic speech recognition
Current speech recognition systems are often based on HMMs with state-clustered Gaussian Mixture Models (GMMs) to represent the context-dependent output distributions. Though highly successful, the standard form of model does not exploit any relationships between the states; each state has separate model parameters. This paper describes a general class of model where the context-dependent state...
Feature and Score Level Combination of Subspace Gaussians in LVCSR Task
In this paper, we investigate the use of discriminatively trained acoustic features modeled by Subspace Gaussian Mixture Models (SGMMs) for Rich Transcription meeting recognition. More specifically, first, we focus on exploiting various types of complex features estimated using a neural network, combined with conventional cepstral features and modeled by standard HMM/GMMs and SGMMs. Then, outpu...
Subspace Gaussian Mixture Models for Automatic Speech Recognition
In most state-of-the-art speech recognition systems, Gaussian mixture models (GMMs) are used to model the density of the emitting states in the hidden Markov models (HMMs). In a conventional system, the model parameters of each GMM are estimated directly and independently given the alignment. This results in a large number of model parameters to be estimated, and consequently, a large amount of...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...